Going to and succeeding in college involves many factors, from enrollment size to the cost of tuition. We will explore how these factors relate to outcomes such as graduation rate and the starting salary of graduates.
The following dataset contains information about a range of US universities, organized by national ranking. The most important piece of information the dataset provides is each university's ranking. People often make assumptions based on a school's ranking, for example about its average starting salary, average tuition, and graduation rate. With information such as tuition, enrollment, location, and median starting salary of alumni, we can test whether ranking is actually correlated with these quantities. We can also build our own models and test whether values such as starting salary can be predicted from the information provided about each undergraduate institution. A few other questions we would like to answer: What is the most important factor when predicting starting salary for undergraduate institutions? Is the cost of undergraduate tuition a key factor? Is starting salary correlated with the male-to-female ratio at the institution? Which variables can be used to predict the cost of tuition, and which of those predictors are most influential? Can we predict graduation rate?
You will use R and the following libraries: ggplot2, rvest, tidyverse, and stringr.
The main URL we use contains only limited information about each school, such as ranking and tuition. Detailed information about each school is spread across separate pages, so the first step is to retrieve the URL of each university's page from the US News ranking list and then parse the important fields into tables that can be used for data analysis.
We are scraping the data of 100 schools from https://www.usnews.com/best-colleges/rankings/national-universities. Because the rankings page loads its list in increments, we saved the page's HTML to a local text file (html_top100.txt). We parse that file to find the URL for each college’s informational page.
Note: The information for the University of California–Davis was removed from the dataset because it didn’t contain median alumni salary, which plays a large role in our analysis.
Note: Room and Board, Tuition and Fees, and Median Alumni Salary are all in thousands of dollars.
library(rvest)
library(tidyverse)
#"url" holds the path to the locally saved copy of the rankings page
url <- "html_top100.txt"
college_urls <- url %>%
  read_html() %>%
  html_node("body") %>%
  html_nodes("ol[class~=bEyEue]") %>%
  html_nodes("li[id]") %>%
  html_nodes("h3") %>%
  html_nodes("a[href]") %>%
  html_attr("href")
head(college_urls)
## [1] "/best-colleges/princeton-university-2627"
## [2] "/best-colleges/harvard-university-2155"
## [3] "/best-colleges/columbia-university-2707"
## [4] "/best-colleges/massachusetts-institute-of-technology-2178"
## [5] "/best-colleges/university-of-chicago-1774"
## [6] "/best-colleges/yale-university-1426"
A data frame is created to store the information of each college in rows. Columns are initialized.
index_num <- 0
college_tab_1 <- data.frame("URL" = gsub(" ", "", paste("https://www.usnews.com",college_urls, sep = "")),
"CollegeName"= "", "TuitionFeesThousands" = 0, "RoomBoardThousands" = 0, "TotalEnrollment" = 0, "SchoolType" = "", "YearFounded" = 0, "Setting" = "", "Endowment2017Millions" = 0, "MedianStartingSalaryOfAlumniThousands" = 0, "Selectivity" = "", "Fall2017AcceptanceRate" = 0, "MalePercentage" = 0, "FourYearGraduationRate" = 0, stringsAsFactors = FALSE)
#removing one college that doesn't have a median starting salary, for data uniformity
college_tab_1 <- college_tab_1[-c(40),]
head(college_tab_1)
## URL
## 1 https://www.usnews.com/best-colleges/princeton-university-2627
## 2 https://www.usnews.com/best-colleges/harvard-university-2155
## 3 https://www.usnews.com/best-colleges/columbia-university-2707
## 4 https://www.usnews.com/best-colleges/massachusetts-institute-of-technology-2178
## 5 https://www.usnews.com/best-colleges/university-of-chicago-1774
## 6 https://www.usnews.com/best-colleges/yale-university-1426
## CollegeName TuitionFeesThousands RoomBoardThousands TotalEnrollment
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## SchoolType YearFounded Setting Endowment2017Millions
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## MedianStartingSalaryOfAlumniThousands Selectivity Fall2017AcceptanceRate
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## MalePercentage FourYearGraduationRate
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
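The placeholders above (0 and "") look like real values if a page later fails to parse. A minimal sketch of one alternative, assuming typed NA placeholders are acceptable for this analysis. The helper name `init_college_table` is illustrative and not part of the original code, and only a few of the columns are shown for brevity:

```r
# Hypothetical helper: build the empty table with typed NA placeholders
# so that cells the scraper never filled are easy to spot.
init_college_table <- function(paths, base = "https://www.usnews.com") {
  data.frame(
    URL = paste0(base, paths),          # paste0() joins without a separator
    CollegeName = NA_character_,
    TuitionFeesThousands = NA_real_,
    RoomBoardThousands = NA_real_,
    TotalEnrollment = NA_real_,
    stringsAsFactors = FALSE
  )
}

example_paths <- c("/best-colleges/princeton-university-2627",
                   "/best-colleges/harvard-university-2155")
example_tab <- init_college_table(example_paths)
str(example_tab)
```

With this layout, `na.omit()` or `is.na()` can flag rows that were never populated, which a 0 placeholder would hide.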
Below are functions used to obtain data from the website and parse it.
#retrieves a vector of size three containing the Tuition&Fees, Room&Board, and total enrollment
get_info <- function(url_html){
  #the pipeline is the last expression, so its result is returned visibly
  url_html %>% html_node("body") %>% html_nodes("div[id~=content-main]") %>%
    html_nodes("section[class~=hero-stats-widget-stats]") %>%
    html_nodes("ul") %>% html_nodes("li") %>% html_nodes("strong")
}
#takes in a vector and index, and parses that information to a double
#ex: $47,263 -> 47263.0
get_tuition_rm <- function(info, num){
a_1 <- info[num] %>% html_text()
tuition_rm <-
as.double(paste(substring(a_1, 2, str_locate(a_1, ",")[1] - 1), substring(a_1, str_locate(a_1, ",")[1] + 1, str_locate(a_1, " ")[1] - 1), sep=""))
tuition_rm / 1000.0
}
#takes in a vector and parses the total enrollment information to a double
get_enrollment <- function(info){
a_1 <- info[3] %>% html_text()
as.double(paste(substring(a_1, 1, str_locate(a_1, ",")[1] - 1), substring(a_1, str_locate(a_1, ",")[1] + 1), sep=""))
}
#gets the percentage of the majority gender at a certain university
get_percent <- function(url_html){
attr <- url_html %>% html_node("body") %>% html_nodes("div[id~=content-main]") %>%
html_nodes("div[class~=block-normal]") %>% html_nodes("span[class~=distribution-breakdown__percentage]") %>% html_text()
as.double(substring(attr, 1, str_locate(attr, "%")[1] - 1)) / 100.0
}
#retrieves the gender of the majority sex and parses the percentage to be in terms of males
get_gender_ratio <- function(url_html){
attr <- url_html %>% html_node("body") %>% html_nodes("div[id~=content-main]") %>%
html_nodes("div[class~=block-normal]") %>% html_nodes("span[class~=distribution-breakdown__percentage-copy]") %>% html_text()
attr <- sub("\n ","",attr)
attr <- sub("\n ","",attr)
if (attr == "Female"){
1 - get_percent(url_html)
}else{
get_percent(url_html)
}
}
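The substring-based parsing in `get_tuition_rm` and `get_enrollment` relies on each value containing exactly one comma. A minimal sketch of a more tolerant approach, assuming the text looks like the scraped strings shown in this document (for example "$47,140 (2018-19)" or "89%"). The helper names `parse_dollars_thousands` and `parse_percent` are hypothetical and not from the original code:

```r
# Hypothetical helpers: keep only digits and the decimal point, then convert.
parse_dollars_thousands <- function(x) {
  # Drop a parenthesised note such as "(2018-19)" first; otherwise its
  # digits would be glued onto the dollar amount.
  x <- sub("\\(.*\\)", "", x)
  as.numeric(gsub("[^0-9.]", "", x)) / 1000
}

parse_percent <- function(x) {
  # "89%" becomes 0.89
  as.numeric(gsub("[^0-9.]", "", x)) / 100
}

tuition_example <- parse_dollars_thousands("\n $47,140 (2018-19)")
salary_example <- parse_dollars_thousands("\n $68,400*")
grad_rate_example <- parse_percent("\n 89%")
print(c(tuition_example, salary_example, grad_rate_example))
```

Both helpers are vectorised, so they could be applied to a whole column at once rather than one cell at a time.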
Here, we use both the functions above and the html_node function to fill out the table.
college_tab <- college_tab_1
for (i in 1:nrow(college_tab)){
url_html <- college_tab[i,1] %>%read_html()
college_tab[i,]$CollegeName <- url_html %>% html_node("body") %>% html_nodes("h1[class~=hero-heading]") %>% html_text()
priv_tuition <- url_html %>% html_node("body") %>% html_nodes("span[data-test-id~=v_private_tuition]") %>% html_text()
college_tab[i,]$TuitionFeesThousands <- ifelse(length(priv_tuition) > 0, priv_tuition,
url_html %>% html_node("body") %>% html_node("span[data-test-id~=v_out_state_tuition]") %>% html_text())
college_tab[i,]$RoomBoardThousands <- url_html %>% html_node("body") %>% html_node("span[data-test-id~=w_room_board]") %>% html_text()
college_tab[i,]$TotalEnrollment <- url_html %>% html_node("body") %>% html_node("span[data-test-id~=total_all_students]") %>% html_text()
college_tab[i,]$MalePercentage <- get_gender_ratio(url_html)
college_tab[i,]$Fall2017AcceptanceRate <- url_html %>% html_node("span[data-test-id~=r_c_accept_rate]") %>% html_text()
college_tab[i,]$Selectivity <- url_html %>% html_node("span[data-test-id~=c_select_class]") %>% html_text()
college_tab[i,]$FourYearGraduationRate <- url_html %>% html_node("span[data-test-id~=grad_rate_4_year]") %>% html_text()
college_tab[i,]$MedianStartingSalaryOfAlumniThousands <- url_html %>% html_nodes("div[data-field-id=averageStartSalary]") %>%html_node("span[data-test-id]") %>% html_text()
temp_vector <- url_html %>% html_node("body") %>% html_nodes("div[id~=content-main]") %>%html_nodes("div[class~=flex-row]") %>% html_nodes("span[class~=heading-small]") %>% html_text()
college_tab[i,]$SchoolType <- temp_vector[1]
college_tab[i,]$YearFounded <- temp_vector[2]
college_tab[i,]$Setting <- temp_vector[5]
college_tab[i,]$Endowment2017Millions <- temp_vector[6]
}
head(college_tab)
## URL
## 1 https://www.usnews.com/best-colleges/princeton-university-2627
## 2 https://www.usnews.com/best-colleges/harvard-university-2155
## 3 https://www.usnews.com/best-colleges/columbia-university-2707
## 4 https://www.usnews.com/best-colleges/massachusetts-institute-of-technology-2178
## 5 https://www.usnews.com/best-colleges/university-of-chicago-1774
## 6 https://www.usnews.com/best-colleges/yale-university-1426
## CollegeName
## 1 \n Princeton University\n
## 2 \n Harvard University\n
## 3 \n Columbia University\n
## 4 \n Massachusetts Institute of Technology\n
## 5 \n University of Chicago\n
## 6 \n Yale University\n
## TuitionFeesThousands RoomBoardThousands
## 1 \n $47,140 (2018-19) \n $15,610 (2018-19)
## 2 \n $50,420 (2018-19) \n $17,160 (2018-19)
## 3 \n $59,430 (2018-19) \n $14,016 (2018-19)
## 4 \n $51,832 (2018-19) \n $15,510 (2018-19)
## 5 \n $57,006 (2018-19) \n $16,350 (2018-19)
## 6 \n $53,430 (2018-19) \n $16,000 (2018-19)
## TotalEnrollment SchoolType YearFounded Setting
## 1 \n 8,273 Private, Coed 1746 Suburban
## 2 \n 20,604 Private, Coed 1636 Urban
## 3 \n 25,968 Private, Coed 1754 Urban
## 4 \n 11,466 Private, Coed 1861 Urban
## 5 \n 13,736 Private, Coed 1890 Urban
## 6 \n 12,974 Private, Coed 1701 City
## Endowment2017Millions MedianStartingSalaryOfAlumniThousands
## 1 $23.4 billion \n $68,400*
## 2 $37.1 billion \n $66,500*
## 3 $10.0 billion \n $64,900*
## 4 $14.8 billion + \n $79,800*
## 5 $6.6 billion + \n $57,700*
## 6 $27.2 billion + \n $63,200*
## Selectivity Fall2017AcceptanceRate MalePercentage
## 1 \n Most selective \n 6% 0.51
## 2 \n Most selective \n 5% 0.52
## 3 \n Most selective \n 6% 0.52
## 4 \n Most selective \n 7% 0.54
## 5 \n Most selective \n 9% 0.51
## 6 \n Most selective \n 7% 0.50
## FourYearGraduationRate
## 1 \n 89%
## 2 \n 84%
## 3 \n 88%
## 4 \n 85%
## 5 \n 88%
## 6 \n 87%
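In the loop above, an error from `read_html()` on any single page would stop the whole run. A minimal sketch of one way to guard against that, assuming it is acceptable to record a failed page as NA and move on. The wrapper `fetch_or_na` and the stand-in `fake_fetch` are hypothetical names used only for illustration; in the real loop, `fetch` would be something like `read_html`:

```r
# Hypothetical wrapper: `fetch` is any function that takes a URL and
# returns a parsed page.
fetch_or_na <- function(url, fetch, pause_seconds = 0) {
  Sys.sleep(pause_seconds)  # optional delay between requests
  tryCatch(
    fetch(url),
    error = function(e) {
      # Report the failure and return NA instead of aborting the loop.
      message("Skipping ", url, ": ", conditionMessage(e))
      NA
    }
  )
}

# Stand-in fetcher for this demonstration only; it fails on one URL.
fake_fetch <- function(url) {
  if (grepl("broken", url)) stop("page not found")
  paste("page for", url)
}

results <- lapply(c("https://example.com/a", "https://example.com/broken"),
                  fetch_or_na, fetch = fake_fetch)
```

Rows whose page came back as NA can then be skipped or retried, instead of losing the pages that were already scraped.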
Below, we reformat many of the columns to get usable data. Each column is categorized into the appropriate type of data.
formatted_college_tab <- college_tab
#fix type of School Type, Setting, Year Founded
formatted_college_tab$SchoolType <- as.factor(formatted_college_tab$SchoolType)
formatted_college_tab$Setting <- as.factor(formatted_college_tab$Setting)
formatted_college_tab$YearFounded <- as.integer(formatted_college_tab$YearFounded)
#fix Endowment2017 formatting
formatted_college_tab$Endowment2017Millions <- ifelse(grepl("billion", formatted_college_tab$Endowment2017Millions ), sub("\\.","",formatted_college_tab$Endowment2017Millions ),formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <-sub(" billion","00",formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <-sub(" million","",formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <-sub("[[:punct:]]", "",formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <-sub("\\$", "",formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <-sub(" \\+", "",formatted_college_tab$Endowment2017Millions )
formatted_college_tab$Endowment2017Millions <- as.double(formatted_college_tab$Endowment2017Millions)
#fix College Name formatting
formatted_college_tab$CollegeName <- sub("^\n ","",formatted_college_tab$CollegeName)
formatted_college_tab$CollegeName <-sub("\n ","",formatted_college_tab$CollegeName)
#fixing Acceptance Rate formatting
formatted_college_tab$Fall2017AcceptanceRate <- sub("\n ","",formatted_college_tab$Fall2017AcceptanceRate)
formatted_college_tab$Fall2017AcceptanceRate <- sub("%","",formatted_college_tab$Fall2017AcceptanceRate)
formatted_college_tab$Fall2017AcceptanceRate <- as.double(formatted_college_tab$Fall2017AcceptanceRate)
formatted_college_tab$Fall2017AcceptanceRate <- formatted_college_tab$Fall2017AcceptanceRate/100
#fixing Grad Rate formatting
formatted_college_tab$FourYearGraduationRate <- sub("\n ","",formatted_college_tab$FourYearGraduationRate)
formatted_college_tab$FourYearGraduationRate <- sub("%","",formatted_college_tab$FourYearGraduationRate)
formatted_college_tab$FourYearGraduationRate <- as.double(formatted_college_tab$FourYearGraduationRate)
formatted_college_tab$FourYearGraduationRate <- formatted_college_tab$FourYearGraduationRate/100
#fixing Salary formatting
formatted_college_tab$MedianStartingSalaryOfAlumniThousands <-
sub("\n ","",formatted_college_tab$MedianStartingSalaryOfAlumniThousands)
formatted_college_tab$MedianStartingSalaryOfAlumniThousands <- gsub("\\*","",formatted_college_tab$MedianStartingSalaryOfAlumniThousands)
formatted_college_tab$MedianStartingSalaryOfAlumniThousands <- gsub("\\$","",formatted_college_tab$MedianStartingSalaryOfAlumniThousands)
formatted_college_tab$MedianStartingSalaryOfAlumniThousands <- gsub("\\,","",formatted_college_tab$MedianStartingSalaryOfAlumniThousands)
formatted_college_tab$MedianStartingSalaryOfAlumniThousands <- as.double(formatted_college_tab$MedianStartingSalaryOfAlumniThousands)/1000
#fixing Selectivity formatting
formatted_college_tab$Selectivity <- sub("\n ","",formatted_college_tab$Selectivity)
formatted_college_tab$Selectivity <- as.factor(formatted_college_tab$Selectivity)
#fixing Tuition formatting
formatted_college_tab$TuitionFeesThousands <- sub("\n ", "",formatted_college_tab$TuitionFeesThousands )
formatted_college_tab$TuitionFeesThousands <- sub(" \\(2018-19\\)", "",formatted_college_tab$TuitionFeesThousands )
formatted_college_tab$TuitionFeesThousands <-sub("\\,", "",formatted_college_tab$TuitionFeesThousands )
formatted_college_tab$TuitionFeesThousands <-sub("\\$", "",formatted_college_tab$TuitionFeesThousands )
formatted_college_tab$TuitionFeesThousands <- as.double(formatted_college_tab$TuitionFeesThousands)/1000
## Warning: NAs introduced by coercion
#fixing RoomBoard formatting
formatted_college_tab$RoomBoardThousands <- sub("\n ", "",formatted_college_tab$RoomBoardThousands )
formatted_college_tab$RoomBoardThousands <- sub(" \\(2018-19\\)", "",formatted_college_tab$RoomBoardThousands )
formatted_college_tab$RoomBoardThousands <-sub("\\,", "",formatted_college_tab$RoomBoardThousands )
formatted_college_tab$RoomBoardThousands <-sub("\\$", "",formatted_college_tab$RoomBoardThousands )
formatted_college_tab$RoomBoardThousands <- as.double(formatted_college_tab$RoomBoardThousands)/1000
## Warning: NAs introduced by coercion
#fixing Enrollment formatting
formatted_college_tab$TotalEnrollment <- sub("\n ", "",formatted_college_tab$TotalEnrollment )
formatted_college_tab$TotalEnrollment <-sub("\\,", "",formatted_college_tab$TotalEnrollment )
formatted_college_tab$TotalEnrollment <- as.double(formatted_college_tab$TotalEnrollment)
formatted_college_tab <- formatted_college_tab %>% mutate(TotalCostThousands =TuitionFeesThousands + RoomBoardThousands )
formatted_college_tab <- na.omit(formatted_college_tab)
nrow(formatted_college_tab)
## [1] 107
as.tibble(formatted_college_tab)
## Warning: `as.tibble()` is deprecated, use `as_tibble()` (but mind the new semantics).
## This warning is displayed once per session.
## # A tibble: 107 x 15
## URL CollegeName TuitionFeesThou… RoomBoardThousa… TotalEnrollment
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 http… Princeton … 47.1 15.6 8273
## 2 http… Harvard Un… 50.4 17.2 20604
## 3 http… Columbia U… 59.4 14.0 25968
## 4 http… Massachuse… 51.8 15.5 11466
## 5 http… University… 57.0 16.4 13736
## 6 http… Yale Unive… 53.4 16 12974
## 7 http… Stanford U… 51.4 15.8 17178
## 8 http… Duke Unive… 56.0 15.9 16294
## 9 http… University… 55.6 15.6 21907
## 10 http… Johns Hopk… 53.7 15.8 25151
## # … with 97 more rows, and 10 more variables: SchoolType <fct>,
## # YearFounded <int>, Setting <fct>, Endowment2017Millions <dbl>,
## # MedianStartingSalaryOfAlumniThousands <dbl>, Selectivity <fct>,
## # Fall2017AcceptanceRate <dbl>, MalePercentage <dbl>,
## # FourYearGraduationRate <dbl>, TotalCostThousands <dbl>
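The chain of `sub()` calls used for the endowment column works because every "billion" value has exactly one digit after the decimal point. A minimal sketch of an alternative that does not depend on that, assuming the raw strings look like "$23.4 billion +" or "$332.4 million". The helper name `parse_endowment_millions` is hypothetical:

```r
# Hypothetical helper: convert an endowment string to millions of dollars.
parse_endowment_millions <- function(x) {
  amount <- as.numeric(gsub("[^0-9.]", "", x))   # keep digits and "."
  multiplier <- ifelse(grepl("billion", x, fixed = TRUE), 1000, 1)
  amount * multiplier
}

raw_endowments <- c("$23.4 billion", "$14.8 billion +", "$332.4 million")
endowment_millions <- parse_endowment_millions(raw_endowments)
print(endowment_millions)

# trimws() removes leading and trailing whitespace, including newlines,
# which could stand in for the paired sub("\n ", ...) calls.
clean_name <- trimws("\n   Princeton University\n   ")
print(clean_name)
```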
#save as csv so we can keep working on the data without re-scraping the pages
write.csv(formatted_college_tab, file = "college_info.csv")
formatted_college_tab <- read.csv("college_info.csv")
formatted_college_tab <- formatted_college_tab[,-c(1)]
as.tibble(formatted_college_tab)
## # A tibble: 107 x 15
## URL CollegeName TuitionFeesThou… RoomBoardThousa… TotalEnrollment
## <fct> <fct> <dbl> <dbl> <int>
## 1 http… Princeton … 47.1 15.6 8273
## 2 http… Harvard Un… 50.4 17.2 20604
## 3 http… Columbia U… 59.4 14.0 25968
## 4 http… Massachuse… 51.8 15.5 11466
## 5 http… University… 57.0 16.4 13736
## 6 http… Yale Unive… 53.4 16 12974
## 7 http… Stanford U… 51.4 15.8 17178
## 8 http… Duke Unive… 56.0 15.9 16294
## 9 http… University… 55.6 15.6 21907
## 10 http… Johns Hopk… 53.7 15.8 25151
## # … with 97 more rows, and 10 more variables: SchoolType <fct>,
## # YearFounded <int>, Setting <fct>, Endowment2017Millions <dbl>,
## # MedianStartingSalaryOfAlumniThousands <dbl>, Selectivity <fct>,
## # Fall2017AcceptanceRate <dbl>, MalePercentage <dbl>,
## # FourYearGraduationRate <dbl>, TotalCostThousands <dbl>
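By default `write.csv()` writes the row names as an extra first column, which is why the reload step above drops column 1. A minimal sketch of an alternative, assuming the row names are not needed. The small `demo` data frame and the temporary file path are illustrative only:

```r
# Small stand-in table, with values taken from the output shown earlier.
demo <- data.frame(CollegeName = c("Princeton University", "Harvard University"),
                   TuitionFeesThousands = c(47.14, 50.42),
                   stringsAsFactors = FALSE)

csv_path <- tempfile(fileext = ".csv")   # temporary file for the demo

# row.names = FALSE keeps the row-name column out of the file.
write.csv(demo, file = csv_path, row.names = FALSE)

# stringsAsFactors = FALSE keeps text columns as character on reload.
reloaded <- read.csv(csv_path, stringsAsFactors = FALSE)
print(names(reloaded))
```

Reading the text columns back as character rather than factor keeps `URL` and `CollegeName` as plain strings, matching their types before the save.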
formatted_college_tab
## URL
## 1 https://www.usnews.com/best-colleges/princeton-university-2627
## 2 https://www.usnews.com/best-colleges/harvard-university-2155
## 3 https://www.usnews.com/best-colleges/columbia-university-2707
## 4 https://www.usnews.com/best-colleges/massachusetts-institute-of-technology-2178
## 5 https://www.usnews.com/best-colleges/university-of-chicago-1774
## 6 https://www.usnews.com/best-colleges/yale-university-1426
## 7 https://www.usnews.com/best-colleges/stanford-university-1305
## 8 https://www.usnews.com/best-colleges/duke-university-2920
## 9 https://www.usnews.com/best-colleges/university-of-pennsylvania-3378
## 10 https://www.usnews.com/best-colleges/jhu-2077
## 11 https://www.usnews.com/best-colleges/northwestern-university-1739
## 12 https://www.usnews.com/best-colleges/california-institute-of-technology-1131
## 13 https://www.usnews.com/best-colleges/dartmouth-college-2573
## 14 https://www.usnews.com/best-colleges/brown-university-3401
## 15 https://www.usnews.com/best-colleges/vanderbilt-3535
## 16 https://www.usnews.com/best-colleges/cornell-university-2711
## 17 https://www.usnews.com/best-colleges/rice-3604
## 18 https://www.usnews.com/best-colleges/university-of-notre-dame-1840
## 19 https://www.usnews.com/best-colleges/university-of-california-los-angeles-1315
## 20 https://www.usnews.com/best-colleges/washington-university-in-st-louis-2520
## 21 https://www.usnews.com/best-colleges/emory-university-1564
## 22 https://www.usnews.com/best-colleges/georgetown-university-1445
## 23 https://www.usnews.com/best-colleges/university-of-california-berkeley-1312
## 24 https://www.usnews.com/best-colleges/university-of-southern-california-1328
## 25 https://www.usnews.com/best-colleges/carnegie-mellon-university-3242
## 26 https://www.usnews.com/best-colleges/uva-6968
## 27 https://www.usnews.com/best-colleges/tufts-university-2219
## 28 https://www.usnews.com/best-colleges/university-of-michigan-ann-arbor-9092
## 29 https://www.usnews.com/best-colleges/wake-forest-2978
## 30 https://www.usnews.com/best-colleges/nyu-2785
## 31 https://www.usnews.com/best-colleges/university-of-california-santa-barbara-1320
## 32 https://www.usnews.com/best-colleges/unc-2974
## 33 https://www.usnews.com/best-colleges/university-of-rochester-2894
## 34 https://www.usnews.com/best-colleges/brandeis-university-2133
## 35 https://www.usnews.com/best-colleges/georgia-institute-of-technology-1569
## 36 https://www.usnews.com/best-colleges/university-of-florida-1535
## 37 https://www.usnews.com/best-colleges/boston-college-2128
## 38 https://www.usnews.com/best-colleges/william-and-mary-3705
## 39 https://www.usnews.com/best-colleges/university-of-california-san-diego-1317
## 40 https://www.usnews.com/best-colleges/boston-university-2130
## 41 https://www.usnews.com/best-colleges/case-western-reserve-university-3024
## 42 https://www.usnews.com/best-colleges/northeastern-university-2199
## 43 https://www.usnews.com/best-colleges/tulane-university-2029
## 44 https://www.usnews.com/best-colleges/pepperdine-university-1264
## 45 https://www.usnews.com/best-colleges/university-of-georgia-1598
## 46 https://www.usnews.com/best-colleges/university-of-illinois-urbanachampaign-1775
## 47 https://www.usnews.com/best-colleges/rpi-2803
## 48 https://www.usnews.com/best-colleges/university-of-texas-3658
## 49 https://www.usnews.com/best-colleges/university-of-wisconsin-3895
## 50 https://www.usnews.com/best-colleges/villanova-3388
## 51 https://www.usnews.com/best-colleges/lehigh-university-3289
## 52 https://www.usnews.com/best-colleges/syracuse-university-2882
## 53 https://www.usnews.com/best-colleges/university-of-miami-1536
## 54 https://www.usnews.com/best-colleges/ohio-state-6883
## 55 https://www.usnews.com/best-colleges/purdue-university-west-lafayette-1825
## 56 https://www.usnews.com/best-colleges/rutgers-new-brunswick-6964
## 57 https://www.usnews.com/best-colleges/penn-state-6965
## 58 https://www.usnews.com/best-colleges/smu-3613
## 59 https://www.usnews.com/best-colleges/university-of-washington-3798
## 60 https://www.usnews.com/best-colleges/wpi-2233
## 61 https://www.usnews.com/best-colleges/george-washington-university-1444
## 62 https://www.usnews.com/best-colleges/uconn-29013
## 63 https://www.usnews.com/best-colleges/university-of-maryland-2103
## 64 https://www.usnews.com/best-colleges/byu-3670
## 65 https://www.usnews.com/best-colleges/clark-university-massachusetts-2139
## 66 https://www.usnews.com/best-colleges/clemson-university-3425
## 67 https://www.usnews.com/best-colleges/texas-am-university-college-station-10366
## 68 https://www.usnews.com/best-colleges/florida-state-university-1489
## 69 https://www.usnews.com/best-colleges/fordham-university-2722
## 70 https://www.usnews.com/best-colleges/stevens-institute-of-technology-2639
## 71 https://www.usnews.com/best-colleges/university-of-california-santa-cruz-1321
## 72 https://www.usnews.com/best-colleges/umass-amherst-2221
## 73 https://www.usnews.com/best-colleges/university-of-pittsburgh-3379
## 74 https://www.usnews.com/best-colleges/university-of-minnesota-twin-cities-3969
## 75 https://www.usnews.com/best-colleges/virginia-tech-3754
## 76 https://www.usnews.com/best-colleges/american-university-1434
## 77 https://www.usnews.com/best-colleges/baylor-university-6967
## 78 https://www.usnews.com/best-colleges/suny-binghamton-2836
## 79 https://www.usnews.com/best-colleges/colorado-school-of-mines-1348
## 80 https://www.usnews.com/best-colleges/north-carolina-state-raleigh-2972
## 81 https://www.usnews.com/best-colleges/stony-brook-suny-2838
## 82 https://www.usnews.com/best-colleges/tcu-3636
## 83 https://www.usnews.com/best-colleges/yeshiva-university-2903
## 84 https://www.usnews.com/best-colleges/michigan-state-2290
## 85 https://www.usnews.com/best-colleges/university-of-california-riverside-1316
## 86 https://www.usnews.com/best-colleges/university-of-san-diego-10395
## 87 https://www.usnews.com/best-colleges/howard-university-1448
## 88 https://www.usnews.com/best-colleges/indiana-university-bloomington-1809
## 89 https://www.usnews.com/best-colleges/loyola-university-chicago-1710
## 90 https://www.usnews.com/best-colleges/marquette-university-3863
## 91 https://www.usnews.com/best-colleges/ub-9554
## 92 https://www.usnews.com/best-colleges/university-of-delaware-1431
## 93 https://www.usnews.com/best-colleges/university-of-iowa-1892
## 94 https://www.usnews.com/best-colleges/illinois-institute-of-technology-1691
## 95 https://www.usnews.com/best-colleges/miami-university-7104
## 96 https://www.usnews.com/best-colleges/university-of-colorado-boulder-1370
## 97 https://www.usnews.com/best-colleges/university-of-denver-1371
## 98 https://www.usnews.com/best-colleges/university-of-san-francisco-1325
## 99 https://www.usnews.com/best-colleges/university-of-vermont-3696
## 100 https://www.usnews.com/best-colleges/clarkson-university-2699
## 101 https://www.usnews.com/best-colleges/drexel-university-3256
## 102 https://www.usnews.com/best-colleges/rit-2806
## 103 https://www.usnews.com/best-colleges/university-of-oregon-3223
## 104 https://www.usnews.com/best-colleges/njit-2621
## 105 https://www.usnews.com/best-colleges/st-louis-university-2506
## 106 https://www.usnews.com/best-colleges/suny-environmental-science-and-forestry-2851
## 107 https://www.usnews.com/best-colleges/temple-university-3371
## CollegeName
## 1 Princeton University
## 2 Harvard University
## 3 Columbia University
## 4 Massachusetts Institute of Technology
## 5 University of Chicago
## 6 Yale University
## 7 Stanford University
## 8 Duke University
## 9 University of Pennsylvania
## 10 Johns Hopkins University
## 11 Northwestern University
## 12 California Institute of Technology
## 13 Dartmouth College
## 14 Brown University
## 15 Vanderbilt University
## 16 Cornell University
## 17 Rice University
## 18 University of Notre Dame
## 19 University of California--Los Angeles
## 20 Washington University in St. Louis
## 21 Emory University
## 22 Georgetown University
## 23 University of California--Berkeley
## 24 University of Southern California
## 25 Carnegie Mellon University
## 26 University of Virginia
## 27 Tufts University
## 28 University of Michigan--Ann Arbor
## 29 Wake Forest University
## 30 New York University
## 31 University of California--Santa Barbara
## 32 University of North Carolina--Chapel Hill
## 33 University of Rochester
## 34 Brandeis University
## 35 Georgia Institute of Technology
## 36 University of Florida
## 37 Boston College
## 38 College of William and Mary
## 39 University of California--San Diego
## 40 Boston University
## 41 Case Western Reserve University
## 42 Northeastern University
## 43 Tulane University
## 44 Pepperdine University
## 45 University of Georgia
## 46 University of Illinois--Urbana-Champaign
## 47 Rensselaer Polytechnic Institute
## 48 University of Texas--Austin
## 49 University of Wisconsin--Madison
## 50 Villanova University
## 51 Lehigh University
## 52 Syracuse University
## 53 University of Miami
## 54 Ohio State University--Columbus
## 55 Purdue University--West Lafayette
## 56 Rutgers University--New Brunswick
## 57 Pennsylvania State University--University Park
## 58 Southern Methodist University
## 59 University of Washington
## 60 Worcester Polytechnic Institute
## 61 George Washington University
## 62 University of Connecticut
## 63 University of Maryland--College Park
## 64 Brigham Young University--Provo
## 65 Clark University
## 66 Clemson University
## 67 Texas A&M University--College Station
## 68 Florida State University
## 69 Fordham University
## 70 Stevens Institute of Technology
## 71 University of California--Santa Cruz
## 72 University of Massachusetts--Amherst
## 73 University of Pittsburgh
## 74 University of Minnesota--Twin Cities
## 75 Virginia Tech
## 76 American University
## 77 Baylor University
## 78 Binghamton University--SUNY
## 79 Colorado School of Mines
## 80 North Carolina State University--Raleigh
## 81 Stony Brook University--SUNY
## 82 Texas Christian University
## 83 Yeshiva University
## 84 Michigan State University
## 85 University of California--Riverside
## 86 University of San Diego
## 87 Howard University
## 88 Indiana University--Bloomington
## 89 Loyola University Chicago
## 90 Marquette University
## 91 University at Buffalo--SUNY
## 92 University of Delaware
## 93 University of Iowa
## 94 Illinois Institute of Technology
## 95 Miami University--Oxford
## 96 University of Colorado--Boulder
## 97 University of Denver
## 98 University of San Francisco
## 99 University of Vermont
## 100 Clarkson University
## 101 Drexel University
## 102 Rochester Institute of Technology
## 103 University of Oregon
## 104 New Jersey Institute of Technology
## 105 Saint Louis University
## 106 SUNY College of Environmental Science and Forestry
## 107 Temple University
## TuitionFeesThousands RoomBoardThousands TotalEnrollment SchoolType
## 1 47.140 15.610 8273 Private, Coed
## 2 50.420 17.160 20604 Private, Coed
## 3 59.430 14.016 25968 Private, Coed
## 4 51.832 15.510 11466 Private, Coed
## 5 57.006 16.350 13736 Private, Coed
## 6 53.430 16.000 12974 Private, Coed
## 7 51.354 15.763 17178 Private, Coed
## 8 55.960 15.944 16294 Private, Coed
## 9 55.584 15.616 21907 Private, Coed
## 10 53.740 15.836 25151 Private, Coed
## 11 54.567 16.626 21474 Private, Coed
## 12 52.362 15.525 2238 Private, Coed
## 13 55.035 15.756 6509 Private, Coed
## 14 55.656 14.670 10095 Private, Coed
## 15 49.816 16.234 12592 Private, Coed
## 16 55.188 14.816 23016 Private, Coed
## 17 47.350 14.000 7022 Private, Coed
## 18 53.391 15.410 12467 Private, Coed
## 19 41.294 15.991 45428 Public, Coed
## 20 53.399 16.440 15303 Private, Coed
## 21 51.306 14.456 14273 Private, Coed
## 22 54.104 16.418 19005 Private, Coed
## 23 43.232 17.764 41910 Public, Coed
## 24 56.225 15.400 36487 Private, Coed
## 25 55.465 14.418 14528 Private, Coed
## 26 48.891 11.590 24360 Public, Coed
## 27 56.382 14.560 11449 Private, Coed
## 28 49.350 11.534 46002 Public, Coed
## 29 53.322 16.032 8116 Private, Coed
## 30 51.828 18.156 51123 Private, Coed
## 31 42.486 15.673 25057 Public, Coed
## 32 35.169 11.190 29911 Public, Coed
## 33 53.926 15.938 11648 Private, Coed
## 34 55.395 15.440 5722 Private, Coed
## 35 33.020 14.596 29376 Public, Coed
## 36 28.658 10.120 52669 Public, Coed
## 37 55.464 14.478 13996 Private, Coed
## 38 44.701 12.236 8740 Public, Coed
## 39 42.074 13.733 35772 Public, Coed
## 40 53.948 15.720 33355 Private, Coed
## 41 49.042 15.190 11824 Private, Coed
## 42 51.387 16.880 21489 Private, Coed
## 43 54.820 15.190 11248 Private, Coed
## 44 53.932 15.320 7710 Private, Coed
## 45 30.404 10.038 37606 Public, Coed
## 46 32.568 11.308 48216 Public, Coed
## 47 53.880 15.260 7633 Private, Coed
## 48 37.480 10.804 51525 Public, Coed
## 49 36.805 11.114 43820 Public, Coed
## 50 53.458 14.020 10983 Private, Coed
## 51 52.930 13.600 7017 Private, Coed
## 52 51.853 15.550 22484 Private, Coed
## 53 50.226 14.108 17003 Private, Coed
## 54 30.742 12.434 59837 Public, Coed
## 55 28.804 10.030 41573 Public, Coed
## 56 31.282 12.706 49577 Public, Coed
## 57 34.858 11.570 47119 Public, Coed
## 58 54.493 16.845 11789 Private, Coed
## 59 36.898 12.798 46166 Public, Coed
## 60 50.530 14.774 6642 Private, Coed
## 61 55.230 13.850 27973 Private, Coed
## 62 38.098 12.874 27578 Public, Coed
## 63 35.216 12.429 40521 Public, Coed
## 64 5.620 7.628 34334 Private, Coed
## 65 45.730 9.170 3153 Private, Coed
## 66 36.724 10.832 24387 Public, Coed
## 67 36.636 10.436 67580 Public, Coed
## 68 21.673 10.458 41362 Public, Coed
## 69 52.248 17.969 16037 Private, Coed
## 70 52.202 15.244 6771 Private, Coed
## 71 41.963 16.407 19457 Public, Coed
## 72 34.570 13.202 30340 Public, Coed
## 73 32.052 11.050 28642 Public, Coed
## 74 30.371 10.312 51848 Public, Coed
## 75 31.304 8.408 34440 Public, Coed
## 76 48.459 14.880 13858 Private, Coed
## 77 45.542 7.800 17059 Private, Coed
## 78 24.488 15.058 17342 Public, Coed
## 79 38.584 13.169 6117 Public, Coed
## 80 28.444 11.078 34432 Public, Coed
## 81 26.934 13.698 25989 Public, Coed
## 82 46.950 12.804 10489 Private, Coed
## 83 43.500 12.250 6311 Private, Coed
## 84 39.750 10.272 50019 Public, Coed
## 85 42.879 16.000 23278 Public, Coed
## 86 49.358 12.980 8905 Private, Coed
## 87 26.756 13.895 9392 Private, Coed
## 88 35.456 10.465 43710 Public, Coed
## 89 44.048 14.480 16673 Private, Coed
## 90 41.870 12.720 11426 Private, Coed
## 91 27.758 13.723 30648 Public, Coed
## 92 34.310 12.864 22970 Public, Coed
## 93 30.609 10.450 32166 Public, Coed
## 94 47.646 13.192 7164 Private, Coed
## 95 33.577 13.031 19700 Public, Coed
## 96 37.288 14.418 35230 Public, Coed
## 97 50.556 13.005 11434 Private, Coed
## 98 48.066 14.830 11080 Private, Coed
## 99 42.516 12.462 13340 Public, Coed
## 100 49.444 15.222 4233 Private, Coed
## 101 52.002 13.890 21940 Private, Coed
## 102 44.130 13.046 15346 Private, Coed
## 103 35.478 12.963 22887 Public, Coed
## 104 31.918 13.808 11446 Public, Coed
## 105 43.996 12.290 12098 Private, Coed
## 106 18.218 16.140 2215 Public, Coed
## 107 28.426 11.566 39948 Public, Coed
## YearFounded Setting Endowment2017Millions
## 1 1746 Suburban 23400.0
## 2 1636 Urban 37100.0
## 3 1754 Urban 10000.0
## 4 1861 Urban 14800.0
## 5 1890 Urban 6600.0
## 6 1701 City 27200.0
## 7 1885 Suburban 24800.0
## 8 1838 Suburban 7900.0
## 9 1740 Urban 12200.0
## 10 1876 Urban 3700.0
## 11 1851 Suburban 7900.0
## 12 1891 Suburban 2600.0
## 13 1769 Rural 5000.0
## 14 1764 City 3200.0
## 15 1873 Urban 4100.0
## 16 1865 Rural 6500.0
## 17 1912 Urban 5800.0
## 18 1842 City 9700.0
## 19 1919 Urban 4200.0
## 20 1853 Suburban 7200.0
## 21 1836 City 7600.0
## 22 1789 Urban 1700.0
## 23 1868 City 4400.0
## 24 1880 Urban 5100.0
## 25 1900 Urban 1700.0
## 26 1819 Suburban 6300.0
## 27 1852 Suburban 1700.0
## 28 1817 City 10800.0
## 29 1834 Suburban 1200.0
## 30 1831 Urban 4100.0
## 31 1909 Suburban 332.4
## 32 1789 Suburban 2900.0
## 33 1850 Suburban 2100.0
## 34 1948 Suburban 976.9
## 35 1885 Urban 2000.0
## 36 1853 Suburban 1600.0
## 37 1863 Suburban 2300.0
## 38 1693 Suburban 874.1
## 39 1960 Urban 1400.0
## 40 1839 Urban 2000.0
## 41 1826 Urban 1800.0
## 42 1898 Urban 795.9
## 43 1834 Urban 1300.0
## 44 1937 Suburban 860.3
## 45 1785 City 1200.0
## 46 1867 City 1800.0
## 47 1824 Suburban 674.3
## 48 1883 Urban 3700.0
## 49 1848 City 3800.0
## 50 1842 Suburban 641.3
## 51 1865 City 1300.0
## 52 1870 City 1300.0
## 53 1925 Suburban 948.6
## 54 1870 Urban 4200.0
## 55 1869 City 2300.0
## 56 1766 City 985.5
## 57 1855 City 2000.0
## 58 1911 Urban 1500.0
## 59 1861 Urban 3200.0
## 60 1865 City 502.5
## 61 1821 Urban 1700.0
## 62 1881 Rural 401.3
## 63 1856 Suburban 548.7
## 64 1875 City 1700.0
## 65 1887 City 408.8
## 66 1889 Suburban 682.7
## 67 1876 City 10800.0
## 68 1851 City 639.4
## 69 1841 Urban 691.1
## 70 1870 City 183.9
## 71 1965 Suburban 188.7
## 72 1863 Suburban 323.6
## 73 1787 Urban 3900.0
## 74 1851 Urban 3300.0
## 75 1872 Rural 987.6
## 76 1893 Suburban 622.0
## 77 1845 City 1200.0
## 78 1946 Suburban 109.3
## 79 1874 Suburban 246.1
## 80 1887 City 1100.0
## 81 1957 Suburban 234.0
## 82 1873 Suburban 1500.0
## 83 1886 Urban 506.2
## 84 1855 Suburban 3100.0
## 85 1954 City 231.1
## 86 1949 Urban 503.6
## 87 1867 Urban 646.6
## 88 1820 City 1100.0
## 89 1870 City 593.5
## 90 1881 Urban 626.2
## 91 1846 Suburban 659.2
## 92 1743 Suburban 1400.0
## 93 1847 City 1400.0
## 94 1890 Urban 241.9
## 95 1809 Rural 512.4
## 96 1876 City 596.4
## 97 1864 City 711.3
## 98 1855 Urban 349.6
## 99 1791 Suburban 453.3
## 100 1896 Rural 191.1
## 101 1891 Urban 707.6
## 102 1829 Suburban 847.2
## 103 1876 City 828.5
## 104 1881 Urban 112.4
## 105 1818 Urban 1100.0
## 106 1911 City 35.9
## 107 1884 Urban 615.4
## MedianStartingSalaryOfAlumniThousands Selectivity
## 1 68.4 Most selective
## 2 66.5 Most selective
## 3 64.9 Most selective
## 4 79.8 Most selective
## 5 57.7 Most selective
## 6 63.2 Most selective
## 7 70.7 Most selective
## 8 66.2 Most selective
## 9 66.1 Most selective
## 10 63.4 Most selective
## 11 58.8 Most selective
## 12 81.0 Most selective
## 13 63.8 Most selective
## 14 60.3 Most selective
## 15 61.4 Most selective
## 16 65.0 Most selective
## 17 64.9 Most selective
## 18 62.7 Most selective
## 19 56.6 Most selective
## 20 60.3 Most selective
## 21 57.9 Most selective
## 22 57.9 Most selective
## 23 64.3 Most selective
## 24 58.1 Most selective
## 25 71.6 Most selective
## 26 59.6 Most selective
## 27 59.3 Most selective
## 28 61.9 More selective
## 29 54.1 Most selective
## 30 57.4 Most selective
## 31 53.8 Most selective
## 32 49.6 Most selective
## 33 54.6 More selective
## 34 52.9 Most selective
## 35 68.1 Most selective
## 36 52.8 Most selective
## 37 57.9 Most selective
## 38 53.4 Most selective
## 39 58.0 Most selective
## 40 55.2 Most selective
## 41 63.0 Most selective
## 42 60.1 Most selective
## 43 50.2 Most selective
## 44 52.2 More selective
## 45 49.9 More selective
## 46 59.9 More selective
## 47 68.4 More selective
## 48 56.8 More selective
## 49 53.7 More selective
## 50 61.2 Most selective
## 51 65.8 Most selective
## 52 53.6 More selective
## 53 53.4 More selective
## 54 53.1 More selective
## 55 60.0 More selective
## 56 55.6 More selective
## 57 56.7 More selective
## 58 54.9 More selective
## 59 56.9 More selective
## 60 68.8 More selective
## 61 53.3 More selective
## 62 57.2 More selective
## 63 58.2 More selective
## 64 55.8 More selective
## 65 46.1 More selective
## 66 55.4 More selective
## 67 57.9 More selective
## 68 46.4 More selective
## 69 53.1 More selective
## 70 69.0 Most selective
## 71 52.0 More selective
## 72 53.9 More selective
## 73 53.2 More selective
## 74 54.1 More selective
## 75 60.0 More selective
## 76 48.5 More selective
## 77 51.3 More selective
## 78 55.4 More selective
## 79 69.7 More selective
## 80 55.4 More selective
## 81 54.4 More selective
## 82 51.2 More selective
## 83 53.4 More selective
## 84 52.2 More selective
## 85 50.7 More selective
## 86 54.0 More selective
## 87 51.5 Selective
## 88 49.7 More selective
## 89 49.1 More selective
## 90 54.0 More selective
## 91 51.1 More selective
## 92 54.8 More selective
## 93 49.6 More selective
## 94 61.4 More selective
## 95 53.3 More selective
## 96 53.4 More selective
## 97 50.6 More selective
## 98 54.9 More selective
## 99 49.3 More selective
## 100 63.8 More selective
## 101 59.9 More selective
## 102 60.3 More selective
## 103 48.3 Selective
## 104 60.9 More selective
## 105 50.6 More selective
## 106 52.2 More selective
## 107 48.3 More selective
## Fall2017AcceptanceRate MalePercentage FourYearGraduationRate
## 1 0.06 0.51 0.89
## 2 0.05 0.52 0.84
## 3 0.06 0.52 0.88
## 4 0.07 0.54 0.85
## 5 0.09 0.51 0.88
## 6 0.07 0.50 0.87
## 7 0.05 0.50 0.75
## 8 0.10 0.50 0.88
## 9 0.09 0.49 0.86
## 10 0.12 0.48 0.88
## 11 0.09 0.50 0.84
## 12 0.08 0.55 0.79
## 13 0.10 0.51 0.88
## 14 0.09 0.46 0.86
## 15 0.11 0.49 0.86
## 16 0.13 0.48 0.85
## 17 0.16 0.53 0.83
## 18 0.19 0.53 0.90
## 19 0.16 0.43 0.75
## 20 0.16 0.46 0.88
## 21 0.22 0.40 0.82
## 22 0.16 0.44 0.90
## 23 0.17 0.47 0.76
## 24 0.16 0.48 0.77
## 25 0.22 0.51 0.76
## 26 0.27 0.45 0.88
## 27 0.15 0.49 0.87
## 28 0.27 0.50 0.77
## 29 0.28 0.46 0.84
## 30 0.28 0.43 0.75
## 31 0.33 0.46 0.70
## 32 0.24 0.41 0.84
## 33 0.34 0.50 0.77
## 34 0.34 0.41 0.83
## 35 0.23 0.62 0.39
## 36 0.42 0.44 0.68
## 37 0.32 0.47 0.88
## 38 0.36 0.42 0.85
## 39 0.34 0.51 0.55
## 40 0.25 0.40 0.81
## 41 0.33 0.55 0.66
## 42 0.27 0.49 0.00
## 43 0.21 0.41 0.73
## 44 0.40 0.41 0.77
## 45 0.54 0.43 0.63
## 46 0.62 0.55 0.70
## 47 0.43 0.68 0.61
## 48 0.36 0.47 0.58
## 49 0.54 0.49 0.61
## 50 0.36 0.47 0.87
## 51 0.25 0.55 0.76
## 52 0.47 0.46 0.70
## 53 0.36 0.48 0.72
## 54 0.48 0.52 0.59
## 55 0.57 0.57 0.51
## 56 0.58 0.50 0.60
## 57 0.50 0.53 0.67
## 58 0.49 0.50 0.71
## 59 0.46 0.47 0.65
## 60 0.48 0.64 0.82
## 61 0.41 0.40 0.73
## 62 0.48 0.50 0.70
## 63 0.44 0.53 0.67
## 64 0.52 0.51 0.23
## 65 0.56 0.39 0.77
## 66 0.47 0.51 0.59
## 67 0.70 0.52 0.54
## 68 0.49 0.44 0.63
## 69 0.46 0.42 0.74
## 70 0.44 0.70 0.42
## 71 0.51 0.50 0.55
## 72 0.57 0.50 0.67
## 73 0.60 0.49 0.65
## 74 0.50 0.47 0.64
## 75 0.70 0.57 0.63
## 76 0.29 0.38 0.76
## 77 0.39 0.41 0.60
## 78 0.40 0.51 0.73
## 79 0.56 0.71 0.55
## 80 0.51 0.55 0.50
## 81 0.42 0.53 0.53
## 82 0.41 0.41 0.69
## 83 0.63 0.53 0.76
## 84 0.72 0.49 0.52
## 85 0.57 0.46 0.53
## 86 0.50 0.46 0.70
## 87 0.41 0.31 0.43
## 88 0.76 0.51 0.63
## 89 0.71 0.34 0.69
## 90 0.89 0.47 0.59
## 91 0.57 0.57 0.57
## 92 0.60 0.43 0.73
## 93 0.86 0.47 0.54
## 94 0.54 0.69 0.32
## 95 0.68 0.50 0.67
## 96 0.80 0.56 0.45
## 97 0.58 0.47 0.65
## 98 0.66 0.38 0.67
## 99 0.67 0.42 0.62
## 100 0.66 0.70 0.58
## 101 0.79 0.50 0.00
## 102 0.57 0.67 0.28
## 103 0.83 0.46 0.52
## 104 0.61 0.74 0.28
## 105 0.64 0.40 0.66
## 106 0.52 0.53 0.00
## 107 0.57 0.47 0.45
## TotalCostThousands
## 1 62.750
## 2 67.580
## 3 73.446
## 4 67.342
## 5 73.356
## 6 69.430
## 7 67.117
## 8 71.904
## 9 71.200
## 10 69.576
## 11 71.193
## 12 67.887
## 13 70.791
## 14 70.326
## 15 66.050
## 16 70.004
## 17 61.350
## 18 68.801
## 19 57.285
## 20 69.839
## 21 65.762
## 22 70.522
## 23 60.996
## 24 71.625
## 25 69.883
## 26 60.481
## 27 70.942
## 28 60.884
## 29 69.354
## 30 69.984
## 31 58.159
## 32 46.359
## 33 69.864
## 34 70.835
## 35 47.616
## 36 38.778
## 37 69.942
## 38 56.937
## 39 55.807
## 40 69.668
## 41 64.232
## 42 68.267
## 43 70.010
## 44 69.252
## 45 40.442
## 46 43.876
## 47 69.140
## 48 48.284
## 49 47.919
## 50 67.478
## 51 66.530
## 52 67.403
## 53 64.334
## 54 43.176
## 55 38.834
## 56 43.988
## 57 46.428
## 58 71.338
## 59 49.696
## 60 65.304
## 61 69.080
## 62 50.972
## 63 47.645
## 64 13.248
## 65 54.900
## 66 47.556
## 67 47.072
## 68 32.131
## 69 70.217
## 70 67.446
## 71 58.370
## 72 47.772
## 73 43.102
## 74 40.683
## 75 39.712
## 76 63.339
## 77 53.342
## 78 39.546
## 79 51.753
## 80 39.522
## 81 40.632
## 82 59.754
## 83 55.750
## 84 50.022
## 85 58.879
## 86 62.338
## 87 40.651
## 88 45.921
## 89 58.528
## 90 54.590
## 91 41.481
## 92 47.174
## 93 41.059
## 94 60.838
## 95 46.608
## 96 51.706
## 97 63.561
## 98 62.896
## 99 54.978
## 100 64.666
## 101 65.892
## 102 57.176
## 103 48.441
## 104 45.726
## 105 56.286
## 106 34.358
## 107 39.992
We plot the data in order to visualize relationships among the attributes.
#Starting Salary
#-histograms
library(ggplot2)
plot_1 <- formatted_college_tab %>%
ggplot(aes(MedianStartingSalaryOfAlumniThousands)) +
geom_histogram()+
labs(title="Starting Salary Distribution", x="Median Starting Salary of Alumni (Thousands)", y="Count")
plot_1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The distribution of the median starting salary of alumni across all the schools appears roughly bell-shaped (slightly skewed right), centered around $55,000.
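The `stat_bin()` message above is ggplot2's reminder that the default of 30 bins is arbitrary. A minimal sketch of setting the bin width explicitly follows. It uses only the first twelve salary values printed earlier as a stand-in data frame (`salary_sample` is a hypothetical name), and the 2.5 (thousand dollar) width is an assumption rather than a value from the original analysis.

```r
library(ggplot2)

# Stand-in data: the first twelve MedianStartingSalaryOfAlumniThousands values
# from the printed table (hypothetical object name, not the full dataset).
salary_sample <- data.frame(
  MedianStartingSalaryOfAlumniThousands = c(68.4, 66.5, 64.9, 79.8, 57.7, 63.2,
                                            70.7, 66.2, 66.1, 63.4, 58.8, 81.0)
)

# ASSUMPTION: bins of 2.5 thousand dollars; try several widths in practice.
plot_1_binned <- ggplot(salary_sample, aes(MedianStartingSalaryOfAlumniThousands)) +
  geom_histogram(binwidth = 2.5) +
  labs(title = "Starting Salary Distribution (explicit bin width)",
       x = "Median Starting Salary of Alumni (Thousands)",
       y = "Count")

plot_1_binned
```

On the full table, the same `binwidth` argument could be passed to the existing `geom_histogram()` call. Comparing a few widths is a quick way to check whether the apparent right skew comes from the data or from the binning.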
#Tuition Cost
#-histograms
library(ggplot2)
plot_2 <- formatted_college_tab %>%
ggplot(aes(TuitionFeesThousands)) +
geom_histogram()+
labs(title="Tuition Cost Distribution", x="Tuition Cost (Thousands)", y="Count")
plot_2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The distribution of tuition costs across all the schools is skewed left, with a range of roughly $60,000.
#Acceptance rate vs graduation rate
library(ggplot2)
plot_3 <- formatted_college_tab %>%
ggplot(aes(x=Fall2017AcceptanceRate, y=FourYearGraduationRate)) +
geom_point()+
geom_smooth(method=lm)+
labs(title="Acceptance Vs. Graduation Rate", x="Fall 2017 Acceptance Rate", y="Four Year Graduation Rate")
plot_3
There is a roughly linear, negative relationship between the Fall 2017 acceptance rate and the four-year graduation rate: the higher the acceptance rate, the lower the graduation rate tends to be.
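The strength of this relationship can also be put into numbers. The sketch below uses eleven rows copied from the printed table (rows 1, 10, 20, 30, 42, 50, 60, 70, 80, 90 and 101) rather than the full dataset. It also treats a graduation rate of exactly 0.00 as "not reported"; that is an assumption on our part, since the scraped source does not say what a zero means.

```r
# Eleven (acceptance rate, graduation rate) pairs copied from the printed table.
rates <- data.frame(
  Fall2017AcceptanceRate = c(0.06, 0.12, 0.16, 0.28, 0.27, 0.36,
                             0.48, 0.44, 0.51, 0.89, 0.79),
  FourYearGraduationRate = c(0.89, 0.88, 0.88, 0.75, 0.00, 0.87,
                             0.82, 0.42, 0.50, 0.59, 0.00)
)

# ASSUMPTION: a graduation rate of 0.00 is a missing value, not a true zero.
rates$FourYearGraduationRate[rates$FourYearGraduationRate == 0] <- NA

# Pearson correlation on the rows where both values are present.
r_clean <- cor(rates$Fall2017AcceptanceRate,
               rates$FourYearGraduationRate,
               use = "complete.obs")

# Slope of the fitted line; lm() drops rows containing NA by default.
fit <- lm(FourYearGraduationRate ~ Fall2017AcceptanceRate, data = rates)
slope <- coef(fit)[["Fall2017AcceptanceRate"]]

r_clean
slope
```

If the zeros are in fact missing values, leaving them in would pull the fitted line in `plot_3` downward, so it may be worth recoding them in `formatted_college_tab` before plotting or modeling.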
#Boxplots of (1) graduation rate & (2) admission rate by selectivity
library(ggplot2)
formatted_college_tab$Selectivity <- factor(formatted_college_tab$Selectivity, c("Selective","More selective","Most selective"))
plot_4 <- formatted_college_tab %>%
ggplot(aes(x=Selectivity, y=FourYearGraduationRate)) +
geom_boxplot()+
labs(title="Graduation Rate based on Selectivity", x="Selectivity Level", y="Four Year Graduation Rate")
plot_4
There is a clear difference in four-year graduation rates across selectivity levels. The boxplots show that the three selectivity levels differ noticeably in both range and central tendency: the more selective a college is, the higher its graduation rate tends to be.
#Setting vs. room board
library(ggplot2)
formatted_college_tab$Setting <- factor(formatted_college_tab$Setting, c("Rural","Suburban","Urban", "City"))
plot_5 <- formatted_college_tab %>%
ggplot(aes(x=Setting, y=RoomBoardThousands)) +
geom_boxplot()+
labs(title="Setting vs. Room & Board Costs", x="Setting", y="Room & Board Costs (Thousands)")
plot_5
The boxplots of room and board costs by setting show that a college's setting has some influence on the room and board costs for its students. The median room and board cost for the City setting differs from that of the other settings. The spread is also greater for the City setting, while it is much smaller for the Rural setting.
plot_6 <- formatted_college_tab %>%
ggplot(aes(x=TotalCostThousands, y=MedianStartingSalaryOfAlumniThousands)) +
geom_point()+
geom_smooth(method=lm)+
labs(title="Total Cost vs. Median Starting Salary", x="Total Cost (Thousands)", y="Median Starting Salary Of Alumni (Thousands)")
plot_6
There appears to be a positive linear relationship between median starting salary and the total cost of college. The general trend shows that the more students spend on tuition, room, and board, the higher their starting salary tends to be.
plot_7 <- formatted_college_tab %>%
ggplot(aes(x=SchoolType, y=MedianStartingSalaryOfAlumniThousands)) +
geom_boxplot()+
labs(title="Median Starting Salary Of Alumni Based on School Type", x="School Type", y="Median Starting Salary Of Alumni (Thousands)")
plot_7
Between school types, private colleges seem to have greater starting salaries than public schools, based on the medians of these boxplots.
formatted_college_tab %>% group_by(Selectivity) %>%
summarise(n())
## # A tibble: 3 x 2
## Selectivity `n()`
## <fct> <int>
## 1 Selective 2
## 2 More selective 61
## 3 Most selective 44
plot_8 <- formatted_college_tab %>%
ggplot(aes(x=MalePercentage, y=MedianStartingSalaryOfAlumniThousands)) +
geom_point()+
geom_smooth(method=lm)+
labs(title="Male Percentage vs. Median Starting Salary of Alumni", x="Male Percentage", y="Median Starting Salary Of Alumni (Thousands)")
plot_8
Although the points are scattered with some variation, there is a general positive correlation between median starting salary of alumni and the male percentage of the student body of colleges.
#adjusting dataset to remove variables not able to be used in model fitting
college_info <- formatted_college_tab[,-c(1,2)]
head(college_info)
## TuitionFeesThousands RoomBoardThousands TotalEnrollment SchoolType
## 1 47.140 15.610 8273 Private, Coed
## 2 50.420 17.160 20604 Private, Coed
## 3 59.430 14.016 25968 Private, Coed
## 4 51.832 15.510 11466 Private, Coed
## 5 57.006 16.350 13736 Private, Coed
## 6 53.430 16.000 12974 Private, Coed
## YearFounded Setting Endowment2017Millions
## 1 1746 Suburban 23400
## 2 1636 Urban 37100
## 3 1754 Urban 10000
## 4 1861 Urban 14800
## 5 1890 Urban 6600
## 6 1701 City 27200
## MedianStartingSalaryOfAlumniThousands Selectivity
## 1 68.4 Most selective
## 2 66.5 Most selective
## 3 64.9 Most selective
## 4 79.8 Most selective
## 5 57.7 Most selective
## 6 63.2 Most selective
## Fall2017AcceptanceRate MalePercentage FourYearGraduationRate
## 1 0.06 0.51 0.89
## 2 0.05 0.52 0.84
## 3 0.06 0.52 0.88
## 4 0.07 0.54 0.85
## 5 0.09 0.51 0.88
## 6 0.07 0.50 0.87
## TotalCostThousands
## 1 62.750
## 2 67.580
## 3 73.446
## 4 67.342
## 5 73.356
## 6 69.430
college_info$FourYearGraduationRate <- college_info$FourYearGraduationRate*100
college_info$MalePercentage <- college_info$MalePercentage*100
college_info$Fall2017AcceptanceRate <- college_info$Fall2017AcceptanceRate*100
#linear model fitting: tuition as the response
tuition_lm_1 <- lm(TuitionFeesThousands~.-RoomBoardThousands-TotalCostThousands, data = college_info)
summary(tuition_lm_1)
##
## Call:
## lm(formula = TuitionFeesThousands ~ . - RoomBoardThousands -
## TotalCostThousands, data = college_info)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.860 -2.743 0.146 3.237 14.500
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 8.311e+00 2.691e+01 0.309
## TotalEnrollment -3.283e-05 5.921e-05 -0.555
## SchoolTypePublic, Coed -1.150e+01 1.909e+00 -6.021
## YearFounded 4.876e-03 1.305e-02 0.374
## SettingSuburban 9.797e-01 2.825e+00 0.347
## SettingUrban 1.483e+00 2.899e+00 0.512
## SettingCity -2.112e-01 2.878e+00 -0.073
## Endowment2017Millions -5.211e-05 1.483e-04 -0.351
## MedianStartingSalaryOfAlumniThousands 1.396e-01 1.705e-01 0.819
## SelectivityMore selective 6.810e+00 4.740e+00 1.437
## SelectivityMost selective 1.136e+01 5.175e+00 2.194
## Fall2017AcceptanceRate 5.185e-02 5.897e-02 0.879
## MalePercentage 2.208e-02 1.319e-01 0.167
## FourYearGraduationRate 1.754e-01 4.529e-02 3.873
## Pr(>|t|)
## (Intercept) 0.7581
## TotalEnrollment 0.5806
## SchoolTypePublic, Coed 3.42e-08 ***
## YearFounded 0.7095
## SettingSuburban 0.7295
## SettingUrban 0.6100
## SettingCity 0.9417
## Endowment2017Millions 0.7262
## MedianStartingSalaryOfAlumniThousands 0.4151
## SelectivityMore selective 0.1541
## SelectivityMost selective 0.0307 *
## Fall2017AcceptanceRate 0.3815
## MalePercentage 0.8674
## FourYearGraduationRate 0.0002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.182 on 93 degrees of freedom
## Multiple R-squared: 0.7003, Adjusted R-squared: 0.6584
## F-statistic: 16.72 on 13 and 93 DF, p-value: < 2.2e-16
plot(tuition_lm_1)
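Calling `plot()` on an `lm` object draws R's standard regression diagnostics, one of which marks Cook's distance. The minimum residual of -32.860 reported above suggests that at least one school is fit very poorly, so it can help to list influential rows by number. The sketch below is a simplification: it fits a one-predictor model to ten private schools copied from the printed table (rows 33, 34, 37, 40, 41, 43, 44, 47, 64 and 65), not the full model, and `private_sample` and `sample_fit` are hypothetical names. The 4/n cutoff is a commonly quoted rule of thumb, not a formal test.

```r
# Ten private schools copied from the printed table; row_id is the table row.
private_sample <- data.frame(
  row_id = c(33, 34, 37, 40, 41, 43, 44, 47, 64, 65),
  TuitionFeesThousands = c(53.926, 55.395, 55.464, 53.948, 49.042,
                           54.820, 53.932, 53.880, 5.620, 45.730),
  FourYearGraduationRate = c(77, 83, 88, 81, 66, 73, 77, 61, 23, 77)
)

# Simplified stand-in for tuition_lm_1 with a single predictor.
sample_fit <- lm(TuitionFeesThousands ~ FourYearGraduationRate,
                 data = private_sample)

# Cook's distance combines residual size and leverage for each row.
influence <- cooks.distance(sample_fit)

# Row with the largest influence on the fitted coefficients.
most_influential <- private_sample$row_id[which.max(influence)]

# Rows above the 4/n rule-of-thumb cutoff (heuristic only).
flagged <- private_sample$row_id[influence > 4 / nrow(private_sample)]

most_influential
flagged
```

In this subset the school in row 64, with tuition of 5.620 thousand dollars, stands out. Whether it is also the source of the large negative residual in the full model would need to be checked with `cooks.distance(tuition_lm_1)`.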
tuition_lm_2 <- step(tuition_lm_1, direction = "both", steps = 1000, trace = FALSE)
summary(tuition_lm_2)
##
## Call:
## lm(formula = TuitionFeesThousands ~ SchoolType + Selectivity +
## FourYearGraduationRate, data = college_info)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.663 -2.870 0.402 3.025 14.015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.51347 4.69313 6.502 2.98e-09 ***
## SchoolTypePublic, Coed -12.40552 1.29191 -9.602 6.18e-16 ***
## SelectivityMore selective 7.47358 4.37232 1.709 0.090437 .
## SelectivityMost selective 11.51048 4.51421 2.550 0.012265 *
## FourYearGraduationRate 0.14329 0.03637 3.940 0.000149 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.049 on 102 degrees of freedom
## Multiple R-squared: 0.6854, Adjusted R-squared: 0.6731
## F-statistic: 55.55 on 4 and 102 DF, p-value: < 2.2e-16
plot(tuition_lm_2)
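To make the reduced tuition model concrete, the coefficients printed above can be turned into a worked prediction. The sketch below hard-codes those estimates; `predict_tuition()` is a hypothetical helper written for illustration, not part of the original analysis. It assumes "Private, Coed" and "Selective" are the baseline levels (they are the levels missing from the coefficient table) and that the graduation rate is expressed in percent, as it is after the rescaling step.

```r
# Estimates copied from the summary(tuition_lm_2) output.
coefs <- c(
  "(Intercept)"               = 30.51347,
  "SchoolTypePublic, Coed"    = -12.40552,
  "SelectivityMore selective" = 7.47358,
  "SelectivityMost selective" = 11.51048,
  "FourYearGraduationRate"    = 0.14329
)

# Hypothetical helper: predicted tuition in thousands of dollars.
predict_tuition <- function(is_public, selectivity, grad_rate_pct) {
  # "Selective" is the reference level, so it contributes nothing.
  selectivity_effect <- switch(selectivity,
    "Selective"      = 0,
    "More selective" = coefs[["SelectivityMore selective"]],
    "Most selective" = coefs[["SelectivityMost selective"]],
    stop("unknown selectivity level")
  )
  coefs[["(Intercept)"]] +
    is_public * coefs[["SchoolTypePublic, Coed"]] +
    selectivity_effect +
    grad_rate_pct * coefs[["FourYearGraduationRate"]]
}

# A most selective school with an 85% four-year graduation rate.
private_est <- predict_tuition(FALSE, "Most selective", 85)
public_est  <- predict_tuition(TRUE,  "Most selective", 85)

private_est  # about 54.2
public_est   # about 41.8
```

The two estimates differ by exactly the `SchoolTypePublic, Coed` coefficient, about 12.4 thousand dollars, which is how that term should be read: the expected tuition gap between otherwise comparable public and private schools under this model.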
anova(tuition_lm_2,tuition_lm_1, test="Chisq")
## Analysis of Variance Table
##
## Model 1: TuitionFeesThousands ~ SchoolType + Selectivity + FourYearGraduationRate
## Model 2: TuitionFeesThousands ~ (RoomBoardThousands + TotalEnrollment +
## SchoolType + YearFounded + Setting + Endowment2017Millions +
## MedianStartingSalaryOfAlumniThousands + Selectivity + Fall2017AcceptanceRate +
## MalePercentage + FourYearGraduationRate + TotalCostThousands) -
## RoomBoardThousands - TotalCostThousands
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 102 3731.7
## 2 93 3554.7 9 176.99 0.8653
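The p-value of 0.8653 means the nine extra terms in the full model do not improve the fit by more than chance would, which supports keeping the smaller model. The sketch below reproduces that number from the table by hand. It assumes, as we understand `anova()` for `lm` objects with `test="Chisq"`, that the drop in residual sum of squares is scaled by the full model's residual variance. The F-test version is included for comparison, and all variable names are hypothetical.

```r
# Values copied from the analysis of variance table.
rss_reduced <- 3731.7; df_reduced <- 102   # Model 1 (stepwise)
rss_full    <- 3554.7; df_full    <- 93    # Model 2 (all predictors)

# Extra variation explained by the nine additional terms.
extra_ss <- rss_reduced - rss_full
extra_df <- df_reduced - df_full

# Residual variance estimate taken from the larger model.
scale_full <- rss_full / df_full

# Chi-square version of the comparison (matches test = "Chisq").
chisq_stat <- extra_ss / scale_full
p_chisq <- pchisq(chisq_stat, df = extra_df, lower.tail = FALSE)

# Equivalent partial F-test on the same quantities.
f_stat <- (extra_ss / extra_df) / scale_full
p_f <- pf(f_stat, extra_df, df_full, lower.tail = FALSE)

round(c(chisq = p_chisq, F = p_f), 4)
```

Both versions lead to the same conclusion here, so dropping the nine terms does not significantly worsen the fit.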
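#linear model fitting: median starting salary as the response
#(the gradrate_ prefix in the object names below is kept from the original code; these models predict starting salary, not graduation rate)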
gradrate_lm_1 <- lm(MedianStartingSalaryOfAlumniThousands~.-TuitionFeesThousands-RoomBoardThousands, data = na.omit(college_info))
summary(gradrate_lm_1)
##
## Call:
## lm(formula = MedianStartingSalaryOfAlumniThousands ~ . - TuitionFeesThousands -
## RoomBoardThousands, data = na.omit(college_info))
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.3062 -2.3889 -0.2445 1.8739 15.6698
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.371e+01 1.616e+01 1.468 0.1455
## TotalEnrollment 8.453e-06 3.614e-05 0.234 0.8156
## SchoolTypePublic, Coed -1.430e+00 1.325e+00 -1.079 0.2833
## YearFounded 5.017e-03 7.928e-03 0.633 0.5284
## SettingSuburban -1.682e+00 1.707e+00 -0.986 0.3269
## SettingUrban -1.043e+00 1.760e+00 -0.593 0.5547
## SettingCity -1.384e+00 1.741e+00 -0.795 0.4286
## Endowment2017Millions 2.166e-04 8.718e-05 2.485 0.0147 *
## SelectivityMore selective -2.567e+00 2.885e+00 -0.890 0.3759
## SelectivityMost selective -8.552e-02 3.207e+00 -0.027 0.9788
## Fall2017AcceptanceRate -8.235e-02 3.478e-02 -2.368 0.0200 *
## MalePercentage 5.486e-01 5.637e-02 9.732 7.55e-16 ***
## FourYearGraduationRate 1.843e-02 2.907e-02 0.634 0.5276
## TotalCostThousands 3.458e-02 5.496e-02 0.629 0.5307
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.752 on 93 degrees of freedom
## Multiple R-squared: 0.7248, Adjusted R-squared: 0.6864
## F-statistic: 18.85 on 13 and 93 DF, p-value: < 2.2e-16
plot(gradrate_lm_1)
gradrate_lm_2 <- step(gradrate_lm_1, direction = "both", steps = 1000, trace = FALSE)
summary(gradrate_lm_2)
##
## Call:
## lm(formula = MedianStartingSalaryOfAlumniThousands ~ SchoolType +
## Endowment2017Millions + Selectivity + Fall2017AcceptanceRate +
## MalePercentage, data = na.omit(college_info))
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5908 -2.3830 -0.3075 1.8991 15.2461
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.545e+01 3.638e+00 9.744 3.6e-16 ***
## SchoolTypePublic, Coed -1.912e+00 8.010e-01 -2.386 0.01889 *
## Endowment2017Millions 1.998e-04 7.306e-05 2.735 0.00738 **
## SelectivityMore selective -2.042e+00 2.713e+00 -0.752 0.45357
## SelectivityMost selective 6.553e-01 2.981e+00 0.220 0.82643
## Fall2017AcceptanceRate -9.100e-02 3.173e-02 -2.868 0.00504 **
## MalePercentage 5.428e-01 4.828e-02 11.243 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.672 on 100 degrees of freedom
## Multiple R-squared: 0.7166, Adjusted R-squared: 0.6996
## F-statistic: 42.15 on 6 and 100 DF, p-value: < 2.2e-16
plot(gradrate_lm_2)
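The two salary models illustrate why adjusted R-squared, rather than plain R-squared, is the fairer basis for comparison. Dropping seven terms lowers R-squared slightly (0.7248 to 0.7166) but raises adjusted R-squared (0.6864 to 0.6996), because the adjustment penalizes each extra coefficient. A short sketch of that arithmetic, using the values printed in the two summaries; `adjusted_r2()` is a hypothetical helper name.

```r
# Adjusted R-squared for a model with p predictors (excluding the intercept)
# fitted to n observations.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# 107 schools: residual df (93) + 13 coefficients + 1 intercept.
n_schools <- 107

full_adj    <- adjusted_r2(0.7248, n_schools, p = 13)  # gradrate_lm_1
reduced_adj <- adjusted_r2(0.7166, n_schools, p = 6)   # gradrate_lm_2

round(c(full = full_adj, reduced = reduced_adj), 4)
```

The results agree with the printed summaries up to rounding of the inputs, and the reduced model comes out ahead once the number of coefficients is taken into account.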
anova(gradrate_lm_2,gradrate_lm_1, test="Chisq")
## Analysis of Variance Table
##
## Model 1: MedianStartingSalaryOfAlumniThousands ~ SchoolType + Endowment2017Millions +
## Selectivity + Fall2017AcceptanceRate + MalePercentage
## Model 2: MedianStartingSalaryOfAlumniThousands ~ (TuitionFeesThousands +
## RoomBoardThousands + TotalEnrollment + SchoolType + YearFounded +
## Setting + Endowment2017Millions + Selectivity + Fall2017AcceptanceRate +
## MalePercentage + FourYearGraduationRate + TotalCostThousands) -
## TuitionFeesThousands - RoomBoardThousands
## Res.Df RSS Df Sum of Sq Pr(>Chi)
## 1 100 1348.5
## 2 93 1309.3 7 39.15 0.9045
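In the stepwise models above, school type, selectivity, and four-year graduation rate were retained as predictors of tuition, while school type, endowment, selectivity, acceptance rate, and male percentage were retained as predictors of median starting salary; total cost did not remain in the salary model. Being aware of how these factors relate to succeeding in college is very important when deciding where to go.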
References: -College Ranking Data: https://www.usnews.com/best-colleges/rankings/national-universities